Genome sequences are the basis for studying chromosome structure, the distribution of repetitive and coding sequences, and gene identification and annotation. Genomic data from different species enable comparative phylogenetic analyses of the relationships between species, their origins, and their evolutionary features. Plant genome dynamics, such as transposable element amplification, tandem gene duplication, genomic rearrangements, and mutation, generate variation ranging from single nucleotide polymorphisms (SNPs) and gene presence/absence variants (PAVs) to structural variants (SVs). This variation provides the raw material for natural selection, phenotypic diversity, and adaptation. Its extent has shown that a single reference genome cannot represent the diversity within a species, which has led to the wider adoption of the pan-genome concept. A pan-genome captures the genomic diversity of a species. It includes the core genes found in all individuals and the variable genes that are absent from some individuals.
Fig. 1. Schematic diagram showing the concept, construction methods, and research focus of pan-genomics. (Li et al., 2022)
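The core/variable distinction can be made concrete with a minimal Python sketch. It assumes that genes have already been grouped into gene families shared across accessions (for example, by orthology clustering). It also assumes that each accession is summarized as the set of family IDs it carries. The accession names and family IDs (`g1` to `g5`) are hypothetical. Core families are present in every accession, and variable families are missing from at least one. The 0/1 table is a simple presence/absence (PAV) matrix.

```python
from typing import Dict, List, Set, Tuple


def classify_gene_families(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split gene families into core (present in every accession) and variable (absent from at least one)."""
    gene_sets = list(presence.values())
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)        # every family observed in any accession
    core = set.intersection(*gene_sets)  # families shared by all accessions
    return core, pan - core              # variable = pan-genome minus core


def pav_matrix(presence: Dict[str, Set[str]]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Build a 0/1 presence/absence matrix: one row per accession, one column per gene family."""
    families = sorted(set().union(*presence.values()))
    rows = {acc: [int(f in genes) for f in families] for acc, genes in presence.items()}
    return families, rows


# Hypothetical accessions and gene-family IDs, for illustration only
# (assumes genes were already clustered into families upstream).
presence = {
    "accession_A": {"g1", "g2", "g3", "g4"},
    "accession_B": {"g1", "g2", "g4"},
    "accession_C": {"g1", "g2", "g3", "g5"},
}

core, variable = classify_gene_families(presence)
families, matrix = pav_matrix(presence)
print("core:", sorted(core))          # core: ['g1', 'g2']
print("variable:", sorted(variable))  # variable: ['g3', 'g4', 'g5']
for acc, row in matrix.items():
    print(acc, row)
```

With real data, the same logic applies to thousands of gene families. In practice, studies often set a frequency threshold instead of requiring strict presence in every accession.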
Using a long-read sequencing platform, CD Genomics provides specialized plant pan-genome sequencing services that represent the genetic variation present in a species more accurately. By combining genomic data from multiple germplasm accessions, our pan-genomes can detect and annotate complex DNA polymorphisms such as structural variants (SVs). To date, we have sequenced the pan-genomes of dozens of species, including rice, soybean, maize, wheat, cucumber, chickpea, and tomato. This work supports our clients in crop breeding and in studies of adaptation and evolution.
CD Genomics offers the following methods for constructing pan-genomes, enabling you to build comprehensive pan-genomes for your plant species of interest. We aim to integrate multiple genomes from a subset of germplasm and give biologists easy access to the integrated genetic information. Based on project needs, we will choose the most cost-effective solution for you.
| Strategy | De Novo Assembly | Iterative Assembly | Graph-Based Assembly |
|---|---|---|---|
| Features | The most straightforward approach. The genomes of multiple samples are each assembled from scratch. Comparative analyses then detect all variant types and classify the identified genes as core or dispensable. | A single reference genome is built first. Reads from each additional sample are then mapped to it in turn. Unmapped reads are assembled and added to the reference, yielding a pan-genome of non-redundant sequences. | A graph is used to represent sequence diversity and variation relative to a reference genome. |
| Advantages | Handles repetitive regions well. | Costs less than de novo assembly because each sample can be sequenced at lower depth. This makes it feasible to combine hundreds of samples. | Substantially reduces reference bias compared with traditional linear reference genomes. |
| Disadvantages | Requires high-depth sequencing to produce highly contiguous and accurate assemblies. This is costly for large plant genomes and when hundreds of genomes are needed for a single species. | Individual samples are not assembled in full. As a result, the method struggles with highly repetitive genomes and cannot detect large SVs that are not spanned by a single short read. | Construction and application are currently limited by plant genome complexity, such as high repeat content and polyploidy. They are also limited by the lack of standard downstream analysis and graph visualization tools. |
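The iterative strategy in the table can be illustrated with a toy Python sketch. It is not a production pipeline. Exact substring matching stands in for a real short-read aligner, and a naive greedy suffix/prefix merge stands in for a real assembler of unmapped reads. The sequences and the `min_overlap` value are made up for illustration. The sketch does show the loop the table describes. Each sample's reads are mapped to the current pan-genome, and the unmapped reads are assembled. Only contigs that are not already represented are appended, so the pan-genome stays non-redundant.

```python
from typing import List


def maps_to(read: str, pan_genome: List[str]) -> bool:
    """Toy stand-in for read alignment: exact substring match against any pan-genome sequence."""
    return any(read in seq for seq in pan_genome)


def merge_overlaps(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Toy greedy assembler: repeatedly join the pair with the longest suffix/prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep input order
    # Drop reads fully contained in another read; they add no new sequence.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                # Longest proper overlap where the end of a equals the start of b.
                for k in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
                    if a.endswith(b[:k]):
                        if k > best_k:
                            best_k, best_i, best_j = k, i, j
                        break
        if best_k == 0:
            return contigs  # no overlaps left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)] + [merged]


def iterative_pan_genome(reference: str, samples: List[List[str]]) -> List[str]:
    """Add each sample's unmapped, assembled sequence to a growing non-redundant pan-genome."""
    pan_genome = [reference]
    for reads in samples:
        unmapped = [r for r in reads if not maps_to(r, pan_genome)]
        for contig in merge_overlaps(unmapped):
            if not maps_to(contig, pan_genome):  # skip contigs already represented
                pan_genome.append(contig)
    return pan_genome


# Hypothetical reference and short reads from two samples.
reference = "ACGTACGTTTGA"
samples = [
    ["ACGTAC", "GTTTGA", "CCGGAT", "GGATCA"],
    ["CGGATC", "TTTTAA", "TTAACC"],
]
pan = iterative_pan_genome(reference, samples)
print(pan)  # ['ACGTACGTTTGA', 'CCGGATCA', 'TTTTAACC']
```

Real pipelines replace both helpers with dedicated aligners and assemblers. The loop structure is the part this sketch is meant to illustrate.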
Fig. 2. CD Genomics' plant pan-genome sequencing service process.
CD Genomics, as an industry leader in agricultural genome solutions, will provide you with advanced pan-genome sequencing platforms and sequencing solutions for plants. If you are interested, please feel free to contact us.
Reference